Information extraction model from Ge’ez texts
Authors
Abstract
Nowadays, voluminous and unstructured textual data is found on the Internet that could provide varied and valuable information for different institutions in areas such as health care, business, training, religion, culture, and history, among others. The alarming growth of unstructured data fosters the need for various methods and techniques to extract valuable information from it. However, exploring helpful information to satisfy the needs of stakeholders becomes a problem due to information overload via the Internet. This paper, therefore, presents an effective model for extracting named entities from Ge’ez text using deep learning algorithms. A data set with a total of 5,270 sentences was used for training and testing purposes. Two experimental setups, i.e., long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM), were used to make an empirical evaluation with a training and testing split ratio of 80% to 20%, respectively. Experimental results showed that the proposed model could be a practical solution for building information extraction (IE) systems using Bi-LSTM, reaching training, validation, and testing accuracy as high as 98.59%, 97.96%, and 96.21%, respectively. The performance results reflect a promising performance of the model compared with resource-rich languages such as English.
Keywords
Bi-LSTM; Deep learning; Entity extraction; Ge’ez text; Information extraction
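The 80%/20% split of the 5,270 sentences mentioned in the abstract can be worked through in a few lines. This is only a minimal sketch: the abstract does not say how the split was performed, so the shuffling, the seed, and the helper name `split_sentences` are assumptions made for illustration, and integer IDs stand in for the annotated Ge’ez sentences.

```python
import random


def split_sentences(sentences, train_ratio=0.8, seed=0):
    """Shuffle a copy of `sentences` and split it into (train, test).

    Hypothetical helper: the shuffle and the fixed seed are assumptions,
    not details taken from the paper.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Placeholder corpus: integer IDs standing in for the 5,270 sentences.
corpus = list(range(5270))
train, test = split_sentences(corpus)
print(len(train), len(test))  # 4216 1054
```

Under these assumptions, roughly 4,216 sentences would be available for training and 1,054 for testing; any validation subset would have to be carved out separately, since the abstract does not describe one.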
Similar resources
Information Extraction from Hindi Texts
The paper presents an information extraction system that takes input from Hindi texts and improves the information content retrieved by using an anaphor/pronoun resolution mechanism. The information extraction system developed consists of three major modules: the Language Parser, the Resolution System, and the Information Extractor. The language parser used is HPSG (Head-Driven Phrase Structure Grammar) ba...
Temporal Information Extraction from Korean Texts
As documents tend to contain temporal information, extracting such information has recently attracted much research interest. In this paper, we propose a hybrid method that combines machine-learning models and hand-crafted rules for the task of extracting temporal information from unstructured Korean texts. We address Korean-specific research issues and propose a new probabilistic model to gen...
Resources for Information Extraction from Polish texts
The paper presents a collection of resources developed for Information Extraction (IE) from Polish texts. In particular, we mention two IE platforms adapted to Polish and several IE applications built on top of one of them: named entity recognition, creation of terminology lexicons, and data extraction from medical texts.
Exploring the inference role in automatic information extraction from texts
In this paper we present a novel methodology for automatic information extraction from natural language texts, based on the integration of linguistic rules, multiple ontologies and inference resources, integrated with an abstraction layer for linguistic annotation and data representation. The SAURON system was developed to implement and integrate the methodology phases. The knowledge domain of ...
Journal
Journal title: Indonesian Journal of Electrical Engineering and Computer Science
Year: 2023
ISSN: 2502-4752, 2502-4760
DOI: https://doi.org/10.11591/ijeecs.v30.i2.pp787-795